{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "8986ea10-976f-4521-8be5-dcc9a39eed90",
   "metadata": {
    "tags": []
   },
   "source": [
    "# AutoICE - Future of Automatic Sea Ice Mapping \n",
    "This notebook will introduce you to the thematics and help explore and analyze the input and output data of the AutoICE challenge. It should be seen as a supplement to the challenge dataset manual and repeats some aspects. Two other notebooks are available, the quickstart and test_upload. The quickstart notebook contains an example of a pipeline to load the ready-to-train dataset, as well as training and testing of a 'simple' U-Net model. The test_upload notebook creates ice charts from the test scenes and compiles the output into a NetCDF file ready to upload to AI4EO.eu.\n",
    "\n",
    "## Challenge context\n",
    "Synthetic Aperture Radar (SAR) satellite images are used extensively for producing sea ice charts in support of Arctic navigation. However, due to ambiguities in the relationship between SAR backscatter and ice conditions (different ice types and concentrations, as well as different wind conditions over the ocean, have the same backscatter signature), the process of producing ice charts is done by manual interpretation of the satellite data taking into account also the texture patterns of the ice in the SAR images. The process is labour intensive and time-consuming, and thus, the amount of ice charts that are produced on a given day is limited. Automatically generated high-resolution sea ice maps have the potential to increase the use of satellite imagery in ice charting by providing more products and shorter time delays between acquisition and product availability. The design of an automatic and robust sea ice classification scheme has been studied for many years. Recent approaches to this issue that use Convolutional Neural Networks (i.e. image segmentation techniques) show promising results. This challenge aims to create AI systems that can exploit the spatial information of Sentinel-1 SAR images to map sea ice in the Arctic.\n",
    "\n",
    "## Objective\n",
    "Your task will be to map three sea ice parameters, the total Sea Ice Concentration (SIC), the Stage Of Development (SOD), and the floe size (FLOE). To successfully map these sea ice parameters, the AI4Arctic sea ice challenge dataset has been prepared. The AI4Arctic sea ice challenge dataset includes 493 training and 20 testing files in netCDF format. Each file contains: \n",
    "two-channel dual polarized (HH and HV) Sentinel-1 Extra Wide Swath (EW) images, auxiliary Sentinel-1 image parameters, microwave radiometer measurements from the AMSR2 sensor on board the JAXA GCOM-W satellite,\n",
    "several Numerical Weather Prediction (NWP) parameters from the ERA5 reanalysis dataset, the corresponding ice chart based on that Sentinel-1 image, produced by either the Greenland Ice Service at DMI or the Canadian Ice Service (CIS).\n",
    "\n",
    "Two versions of the dataset are available, the raw version and the ready-to-train version, to allow for both full experimentability and to quickly get started. In this notebook, only the ready-to-train version will be examined. The challenge dataset manual contains a more detailed explanation of the dataset. The dataset contains the following data variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78ef20e5-709b-41bc-819b-d052d608108f",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# -- Built-in modules -- #\n",
    "import os\n",
    "os.environ['AI4ARCTIC_DATA'] = ''  # Fill in directory for data location.\n",
    "os.environ['AI4ARCTIC_ENV'] = ''  # Fill in directory for environment with Ai4Arctic get-started package. \n",
    "\n",
    "\n",
    "# -- Third-part modules -- #\n",
    "import xarray as xr\n",
    "\n",
    "scene_name = '20180325T194759_dmi_prep.nc'  # Scene from the challenge dataset manual. Change as you like.\n",
    "scene = xr.open_dataset(os.path.join(os.environ['AI4ARCTIC_DATA'], scene_name))  # Open the scene.\n",
    "scene  # Enable interactive exploration of the scene (below)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdc6899c-827e-423f-8283-29bdcee26a4e",
   "metadata": {},
   "source": [
    "- Sea ice chart parameters\n",
    "    * 'SIC': The total SIC is the percentage ratio of sea ice to open-water for an area, discretised into 11 10%-bin classes ranging from 0% (open-water) to 100% (fully-covered sea ice). \n",
    "    * 'SOD': The SOD can also be viewed as the type of sea ice, which is a proxy for the thickness of the sea ice, i.e. how easy it is to traverse it. The parameter contains 5 classes, 0: Open-water, 1: New ice, 2: Young ice, 3: Thin First-year ice, 4: Thick First-year ice and 5: Old ice, i.e. older than 1 year.\n",
    "    * 'FLOE': The floe size describe how large or continuous the sea ice pieces/chunks are, with 6 parameters: 0: Open-water, 1: Cake ice, 2: Small floe, 3: Medium floe, 4: Big floe, 5: Vast floe, and 6: Bergs, i.e. variants of icebergs and glacier ice.\n",
    "- Geographical information\n",
    "    * 'distance_map': A map over the distance from land for all Sentinel-1 pixels. A nonlinear translation from kilometers to land to an index has been carried out.\n",
    "    * 'sar_2dgrid_latitude': Latitude of subgrid points in the scene (i.e. latitude/longitude per SAR pixel is not available).\n",
    "    * 'sar_2dgrid_longitude': Longitude of subgrid points in the scene.\n",
    "- Synthetic Aperture Radar\n",
    "    * 'nersc_sar_primary': HH Polarization\n",
    "    * 'nersc_sar_secondary': HV Polarization\n",
    "    * 'sar_incidenceangle': inclination of the measurement angle from the instrument to the ground\n",
    "- AMSR2 Passive Microwave Radiometer\n",
    "    * 'btemp_6_9v': 6.9 GHz vertical polarization. Transferred to the Sentinel-1 image geometry.\n",
    "    * 'btemp_6_9h': 6.9 GHz horizontal polarization.\n",
    "    * ..\n",
    "    * 'btemp_89h_0' : 89 GHz horizontal polarization.\n",
    "- ERA5 Environmental Variables\n",
    "    * 'u10m_rotated': ERA5 eastward component of the 10m wind rotated to account for the Sentinel-1 flight direction.\n",
    "    * 'v10m_rotated': ERA5 northward component of the 10m wind rotated to account for the Sentinel-1 flight direction.\n",
    "    * 't2m': ERA5 2m air temperature.\n",
    "    * 'skt': ERA5 skin temperature.\n",
    "    * 'tcwv': ERA5 total column water vapour.\n",
    "    * 'tclw': ERA5 total column cloud liquid water. \n",
    "\n",
    "There is no requirement to use all the data variables.\n",
    "There are three data grids to keep in mind in the dataset. The full grid containing the ice charts, distance map, SAR and auxiliary SAR variables. In connection to the SAR data, there is a grid with Ground Control Point (GCP) e.g. containing geographical location. Finally, there is the subgrid containing the AMSR2 and ERA5 environmental data.\n",
    "\n",
    "## Preprocessing steps.\n",
    "In the raw dataset, ice charts are stored in a 2darray containing ID numbers for polygons with an associated lookup table with ice information for each polygon. A polygon icechart conversion script is attached to do a similar conversion for the raw dataset. As the SOD and FLOE codes are given for partial sea ice concentrations, we utilize a threshold of 65% to indicate when a class is dominant in a polygon.\n",
    "\n",
    "To reduce the barrier of entry and ease of working with the data, the original 40 m pixel spacing ($\\sim$10,000 x 10,000 pixels) in the SAR (and charts etc.) data have been downsampled to 80 m ($\\sim$5,000 x 5,000 pixels). It is also required to deliver icecharts in this pixel spacing. The SAR, distance map and incidence angle have been downsampled using a 2×2 averaging kernel, whereas ice charts are reduced with a 2x2 max kernel. This is followed by aligning the masks (nan-values) across the data (except the subgridded variables). Afterwards, the scenes are standard scaled using the mean and standard deviation of the values for each variable (z = (x - u) / s, where x are original values, u = mean and s = standard deviation). Minimum and maximum, and mean and std values for each variable are also available in the two files 'global_minmax.npy' and 'global_meanstd.npy', respectively.\n",
    "\n",
    "Finally, NaN values in the SAR images are replaced with 0 and 255 in ice charts (stored as the largest value in uint8) to represent non-data or masked pixels. The chart fill value can be used to discount the relevant pixels during the loss calculation. Feel free to change these values or schemes. Below is an illustration of the different dataset parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "14853919-2646-4829-8e7e-b1dfedefd799",
   "metadata": {},
   "source": [
    "# Sea ice charts\n",
    "Manual ice charting from multi-sensor satellite data analysis has for many years been the method applied at the National Ice Centers around the world for producing sea ice information for marine safety. Ice analysts primarily use SAR imagery due to the radar sensor's high resolution and capability to see through clouds and in polar darkness. The ice charts used in the AI4Arctic sea ice challenge dataset are from the Canadian (CIS)- and DMI operational ice services. The ice charts from the two ice services cover sub-regions of the Canadian and Greenland Waters.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "890a7c0e-1b17-433c-91dd-3c640535a959",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# -- Third-part modules -- #\n",
    "import matplotlib as mpl\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "\n",
    "# --Proprietary modules -- #\n",
    "from functions import chart_cbar\n",
    "from utils import CHARTS, GROUP_NAMES, LOOKUP_NAMES, SIC_LOOKUP, SOD_LOOKUP, FLOE_LOOKUP, SCENE_VARIABLES\n",
    "\n",
    "\n",
    "# - Show charts.\n",
    "fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(12, 8))\n",
    "for idx, chart in enumerate(CHARTS):\n",
    "    scene[chart] = scene[chart].astype(float)  # Convert charts from uint8 to float to enable nans in the arrays, \n",
    "    # replace chart fill values with nan for better visualization.\n",
    "    scene[chart].values[scene[chart] == scene[chart].attrs['chart_fill_value']] = np.nan\n",
    "    axs[idx].imshow(scene[chart].values, vmin=0, vmax=LOOKUP_NAMES[chart]['n_classes'] - 2, cmap='jet', interpolation='nearest')\n",
    "    chart_cbar(ax=axs[idx], n_classes=LOOKUP_NAMES[chart]['n_classes'], chart=chart, cmap='jet')  # Creates colorbar with ice class names.\n",
    "\n",
    "plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0.75, hspace=0)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb621043-3e73-4605-b658-99eaec59e8f5",
   "metadata": {},
   "source": [
    "Gaps in the reference chart parameters can result from ice polygons either with no information on the parameter or if there is no dominant class meeting the threshold."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9599a99a-7891-4ede-8163-2984c1e407ef",
   "metadata": {},
   "source": [
    "# Synthetic Aperture Radar data and auxiliary variables\n",
    "Sentinel-1 is the radar mission of the Copernicus Earth Observation programme of the European Union (EU). The Sentinel-1 mission comprises a constellation of two polar-orbiting satellites, Sentinel-1A and Sentinel-1B, sharing the same orbital plane and collecting C-band (4.5 cm wavelength) synthetic aperture radar (SAR) images. Radar images can be acquired regardless of the weather. The Sentinel-1 SAR has the advantage of operating at a wavelength not impeded by cloud cover or a lack of illumination, such as in polar darkness, and can acquire data over a site during day or night time under all weather conditions. Since the Arctic area is dominated by cloud cover and polar darkness for a large part of the year, the SAR instrument has for many years been valuable for Arctic monitoring applications such as sea ice charting.\n",
    "The Sentinel-1 sensor transmits a radar signal towards the ground, and the backscatter is the portion of the outgoing radar signal that the target on the Earth’s surface redirects directly back towards the radar antenna. Some portions of the incident radar signal will be reflected and/or scattered away from the radar or absorbed.\n",
    "\n",
    "The mode and polarization specifications of the Sentinel-1 images in the sea ice dataset are among those that are traditionally used for ice charting; Sentinel-1 Extra Wide Swath Mode (EW) Level-1 Ground Range Detected products in Medium resolution (GRDM) and in dual polarization, HH and HV. These Sentinel-1 image products cover 400 x 400 square kilometres. A Sentinel-1 image in EW mode consists of five sub-swaths. There are some radiometric variations between these sub-swaths, most evident in HV polarization images. Correcting these differences is a complex task. Other phenomena, such as scalloping effects, are also present in some images. The SAR noise correction in this dataset is provided by the NERSC (Korosov et al., 2021).\n",
    "\n",
    "It is common practice to visualize SAR data using a grey colourmap. To improve the contrast, 5 and 95% quantiles are used. The incidence angle and distance map are straightforward."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9428b22-33ae-42b8-97ec-47426824556b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Show full resolution variables.\n",
    "fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(14,14))\n",
    "for idx, full_variable in enumerate(SCENE_VARIABLES[0:4]):\n",
    "    ax = axs[idx // 2, idx % 2]\n",
    "    if 'nersc_sar' in full_variable:\n",
    "        scene[full_variable].values[scene[full_variable].values == scene[full_variable].attrs['variable_fill_value']] = np.nan\n",
    "        label = 'Scaled backscatter'\n",
    "        im = ax.imshow(scene[full_variable].values, cmap='gray',\n",
    "                       vmin=np.nanquantile(scene[full_variable].values, q=0.025),\n",
    "                       vmax=np.nanquantile(scene[full_variable].values, q=0.975))\n",
    "    elif 'incidenceangle' in full_variable:\n",
    "        scene[full_variable].values[scene[full_variable].values == scene[full_variable].attrs['variable_fill_value']] = np.nan\n",
    "        label = 'Incidence angle'\n",
    "        im = ax.imshow(scene[full_variable].values)\n",
    "    elif 'map' in full_variable:\n",
    "        label = 'Distance to land'\n",
    "        im = ax.imshow(scene[full_variable].values)\n",
    "    plt.colorbar(im, ax=ax, fraction=0.0485, pad=0.049, label=label)\n",
    "    \n",
    "    \n",
    "fig.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bdae9187-1d80-4031-9882-9faeb0122960",
   "metadata": {},
   "source": [
    "# AMSR2 channels\n",
    "For each Sentinel-1 scene, a corresponding AMSR2 part of the netCDF file is produced, containing the AMSR2 brightness temperature pixels that are transferred to the Sentinel-1 image geometry. The AMSR2 swaths are resampled to the coordinates of every 50 x 50 (2 km) Sentinel-1 pixel due to the much coarser resolution of the AMSR2 data. Data in the AMSR2 part of the netCDF file are brightness temperatures for each polarization and frequency available from the AMSR2 sensor.  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3710add2-cacb-45fc-8f04-70e1a644f030",
   "metadata": {},
   "outputs": [],
   "source": [
    "# There is no mask in these scene variables.\n",
    "fig, axs = plt.subplots(nrows=4, ncols=4, figsize=(10,10))\n",
    "for idx, amsr2_channel in enumerate(SCENE_VARIABLES[4:17]):\n",
    "    ax = axs[idx // 4, idx % 4]\n",
    "    im = ax.imshow(scene[amsr2_channel])\n",
    "    plt.colorbar(im, ax=ax, fraction=0.0485, pad=0.049, label=amsr2_channel)\n",
    "\n",
    "fig.suptitle('AMSR2 Brightness temperature, 6.9-89 GHz vertical and horizontal polarization', y=0.875)\n",
    "[axs[-1, col].axis('off') for col in range(1, 4)]\n",
    "plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0.9, hspace=-0.6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09e3dfa2-9f53-4957-bc85-18bc36a5b198",
   "metadata": {},
   "source": [
    "#  Environmental variables\n",
    "For each Sentinel-1 scene, a corresponding ERA5 part of the netCDF file is produced, containing several NWP parameters. These parameters are resampled in the same manner as the AMSR2 brightness temperatures. The ERA5 data is retrieved from the ERA5 hourly data on single levels from 1959 to present reanalysis dataset, which is available at the Copernicus Climate Data Store (cds.climate.copernicus.eu). For each Sentinel-1 scene the NWP parameters with the smallest difference in time to the Sentinel-1 acquisition time are retrieved and resampled to the Sentinel-1 pixels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "05676f5a-09a1-4b47-b0f3-6318b400a5f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# There is no mask in these scene variables.\n",
    "fig, axs = plt.subplots(nrows=2, ncols=3, figsize=(10,10))\n",
    "for idx, env_var in enumerate(SCENE_VARIABLES[18:24]):\n",
    "    ax = axs[idx // 3, idx % 3]\n",
    "    im = ax.imshow(scene[env_var])\n",
    "    plt.colorbar(im, ax=ax, fraction=0.0485, pad=0.049, label=env_var + ' ' + scene[env_var].units)\n",
    "\n",
    "fig.suptitle('Environmental variables', y=0.8)\n",
    "plt.subplots_adjust(left=0, bottom=0, right=1, top=1, wspace=0.5, hspace=-0.6)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0766a928-7d79-486f-b0c5-b7e190226f62",
   "metadata": {},
   "source": [
    "# Data statistics\n",
    "In the folder 'misc', statistical information regarding the data is available. Class bins for each chart in every (ready-to-train) scene, as well as means, standard deviations, and the minimum and maximum values of the raw challenge dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e57b556e-f042-4aff-988e-4969523ce81f",
   "metadata": {},
   "outputs": [],
   "source": [
    "global_meanstd = np.load(os.environ['AI4ARCTIC_ENV'] + '/misc/global_meanstd.npy', allow_pickle=True).item()\n",
    "global_minmax = np.load(os.environ['AI4ARCTIC_ENV'] + '/misc/global_minmax.npy', allow_pickle=True).item()\n",
    "\n",
    "for variable in SCENE_VARIABLES:\n",
    "    print(f\"{variable}; min: {global_minmax[variable]['min']:.3f}, mean: {global_meanstd[variable]['mean']:.3f}, \"\\\n",
    "    f\"max: {global_minmax[variable]['max']:.3f}, std: {global_meanstd[variable]['std']:.3f}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
